Dialect classification via discriminative training

نویسندگان

  • Yun Lei
  • John H. L. Hansen
چکیده

Variability in speech due to dialect is a major factor limiting speech system performance for speech recognition, spoken document retrieval, and dialog systems. In this study, we propose a novel discriminative algorithm to improve dialect classification for unsupervised spontaneous speech in Arabic. No transcripts are used for either training or testing, and all data are spontaneous speech. The Gaussian mixture model (GMM) is used as our baseline system for dialect classification. The major motivation is to remove confused/distractive regions from the dialect acoustic space, while emphasizing discriminative/sensitive information. The Kullback-Leibler divergence is used to find the most discriminative GMM mixtures (KLD-GMM), after which the confused acoustic GMM region is removed. The proposed algorithm is evaluated on three dialects of Arabic, with measurable improvement achieved (4.0%), over a generalized maximum likelihood estimation GMM baseline (MLE-GMM) system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discriminative n-gram selection for dialect recognition

Dialect recognition is a challenging and multifaceted problem. Distinguishing between dialects can rely upon many tiers of interpretation of speech data—e.g., prosodic, phonetic, spectral, and word. High-accuracy automatic methods for dialect recognition typically use either phonetic or spectral characteristics of the input. A challenge with spectral system, such as those based on shifted-delta...

متن کامل

Gaussian Mixture Selection and Data Selection for Unsupervised Spanish Dialect Classification

Automatic dialect classification has gained interests in the field of speech research because it is important to characterize speaker traits and to estimate knowledge that could improve integrated speech technology (e.g., speech recognition, speaker recognition). This study addresses novel advances in unsupervised spontaneous Latin American Spanish dialect classification. The problem considers ...

متن کامل

Advances in Word based Dialect/

In an earlier study, we proposed a very effective dialect/accent classification algorithm, which is named Word based Dialect Classification (WDC). The WDC works well for large size corpora and significantly outperforms traditional Large Vocabulary Continuous Speech Recognition (LVCSR) based systems, which is claimed to be the best performing system for language identification. For a small train...

متن کامل

دو روش تبدیل ویژگی مبتنی بر الگوریتم های ژنتیک برای کاهش خطای دسته بندی ماشین بردار پشتیبان

Discriminative methods are used for increasing pattern recognition and classification accuracy. These methods can be used as discriminant transformations applied to features or they can be used as discriminative learning algorithms for the classifiers. Usually, discriminative transformations criteria are different from the criteria of  discriminant classifiers training or  their error. In this ...

متن کامل

Word-Based Dialect Identification with Georeferenced Rules

We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon with the help of hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. In comparison with a character...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008